\section{Introduction}
\label{sec:intro}

We present several optimization techniques for the multiplication of arbitrarily sized square matrices.  These techniques were targeted at the National Energy Research Scientific Computing Center's Franklin cluster, a Cray XT4 massively parallel processing system with approximately 38,000 processing cores.  We implement and evaluate a subset of these techniques on Franklin and, for comparison, on FIXME.  We then analyze and discuss the observed performance improvements.
